Summary and discussion of: “Dropout Training as Adaptive Regularization”

Author

  • Calvin Murdock
Abstract

Multi-layered (i.e. deep) artificial neural networks have recently undergone a resurgence in popularity due to improved processing capabilities and the increasing availability of large datasets. Popular in the 1980s and before, they were largely abandoned in favor of convex methods (such as support vector machines) that came with optimality guarantees and often gave better results in far less time. However, with the modern ubiquity of GPUs and distributed computing, the same methods that were once spurned for their computational intractability have become the de facto standard for large-scale commercial applications at companies such as Google, Facebook, and Baidu. This almost universal adoption of deep learning (as it is now called) is not without reason: variants of these methods have achieved state-of-the-art performance, often by a significant margin over competing algorithms, on numerous tasks such as image classification and speech recognition.

While improved computational power has allowed these models to be trained in a moderate amount of time, their performance is tied to the quantity and quality of the available training data. Deep neural networks are often very large, frequently consisting of upwards of millions (or even billions) of parameters whose values must all be learned from data. Thus, like any statistical model, they are susceptible to overfitting and typically require huge quantities of labeled training examples. Furthermore, the optimization landscape for the network parameters is highly non-convex with many local optima, so even with sufficient training data, the performance of a trained model can still be poor.

These problems have been addressed empirically by a surprisingly simple method referred to as dropout. First introduced by Geoffrey Hinton and his collaborators, dropout has become an important component of most modern deep learning implementations because it reduces overfitting and often leads to better local minima. Even so, very little is understood theoretically about why this should be the case. The paper "Dropout Training as Adaptive Regularization" is one of several recent papers that attempt to understand the role of dropout in training deep neural networks.
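For concreteness, here is a minimal sketch of the dropout operation itself, in the common "inverted dropout" form; the function name and the rescaling convention are illustrative choices, not taken from the paper.

```python
import numpy as np

def dropout(h, p_drop=0.5, train=True, rng=None):
    """Inverted dropout on a layer's activations h.

    During training, each unit is zeroed independently with
    probability p_drop; the survivors are rescaled by 1/(1 - p_drop)
    so that expected activations match the unmodified network used
    at test time.
    """
    if not train or p_drop == 0.0:
        return h  # test time: use the full network unchanged
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p_drop  # Bernoulli keep-mask
    return h * mask / (1.0 - p_drop)

# Example: a batch of 4 examples, each with 5 hidden activations.
h = np.ones((4, 5))
print(dropout(h, p_drop=0.5))
```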


Related papers

Dropout Training as Adaptive Regularization

Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix....
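As a sketch of that claim, reconstructed in the standard GLM notation (the symbols below are assumptions, not quoted from the abstract), the quadratic approximation of the dropout noising penalty for a model with parameters \beta, inputs x_i, log-partition A(\cdot), and dropout rate \delta can be written as:

```latex
% Sketch: quadratic approximation of the dropout penalty for a GLM.
R^{q}(\beta)
  = \frac{\delta}{2(1-\delta)}
    \sum_{j} \beta_j^{2} \sum_{i} A''(x_i \cdot \beta)\, x_{ij}^{2}
% The inner sum is the j-th diagonal entry of an empirical Fisher
% information estimate, so rescaling feature j by its inverse square
% root turns R^{q} into an ordinary (isotropic) L2 penalty.
```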


The dropout learning algorithm

Dropout is a recently introduced algorithm for training neural networks by randomly dropping units during training to prevent their co-adaptation. A mathematical analysis of some of the static and dynamic properties of dropout is provided using Bernoulli gating variables, general enough to accommodate dropout on units or connections, and with variable rates. The framework allows a complete analy...
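A short sketch of that gating framework might look as follows; the per-unit and per-connection keep probabilities play the role of the "variable rates" the abstract mentions, and all names here are illustrative assumptions rather than the paper's own code.

```python
import numpy as np

rng = np.random.default_rng(0)

def gated_layer(x, W, keep_unit=None, keep_conn=None):
    """One linear layer with Bernoulli gates: gates may sit on output
    units, on individual connections, or both, and the keep
    probabilities may vary per unit or per connection. No rescaling
    is applied here; the gates are used directly, as in the paper's
    Bernoulli-gating formalism."""
    if keep_conn is not None:   # gate each connection independently
        W = W * (rng.random(W.shape) < keep_conn)
    h = x @ W
    if keep_unit is not None:   # gate each output unit independently
        h = h * (rng.random(h.shape) < keep_unit)
    return h

x = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 3))
# Variable rates: a different keep probability for each output unit.
print(gated_layer(x, W, keep_unit=np.array([0.9, 0.5, 0.7])))
```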


Compacting Neural Network Classifiers via Dropout Training

We introduce dropout compaction, a novel method for training feed-forward neural networks which realizes the performance gains of training a large model with dropout regularization, yet extracts a compact neural network for run-time efficiency. In the proposed method, we introduce a sparsity-inducing prior on the per unit dropout retention probability so that the optimizer can effectively prune...
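A toy sketch of the pruning idea follows; the specific penalty and threshold are hypothetical stand-ins for the paper's sparsity-inducing prior, used only to illustrate how units with collapsed retention probabilities become removable.

```python
import numpy as np

# Hypothetical per-unit retention probabilities, learned under a
# sparsity-inducing penalty (e.g. lam * theta.sum()); a unit whose
# retention probability collapses toward 0 is almost never kept
# during training and can be pruned from the run-time network.
theta = np.array([0.95, 0.02, 0.71, 0.01, 0.88])
prune_threshold = 0.05  # illustrative cutoff, not from the paper
kept_units = np.where(theta > prune_threshold)[0]
print("units surviving compaction:", kept_units)  # -> [0 2 4]
```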


Dropout Training of Matrix Factorization and Autoencoder for Link Prediction in Sparse Graphs

Matrix factorization (MF) and autoencoders (AE) are among the most successful approaches to unsupervised learning. While MF-based models have been extensively exploited in the graph modeling and link prediction literature, the AE family has not gained much attention. In this paper we investigate the application of both MF and AE to the link prediction problem in sparse graphs. We show the connection ...


Shakeout: A New Regularized Deep Neural Network Training Scheme

Recent years have witnessed the success of deep neural networks on a wealth of practical problems. The invention of effective training techniques has contributed greatly to this success. The so-called "Dropout" training scheme is one of the most powerful tools for reducing over-fitting. From a statistical point of view, Dropout works by implicitly imposing an L2 regularizer on the weights....




Publication date: 2014